221 research outputs found

    SIMCO: SIMilarity-based object COunting

    We present SIMCO, the first agnostic multi-class object counting approach. SIMCO starts by detecting foreground objects through a novel Mask RCNN-based architecture trained beforehand (just once) on a brand-new synthetic 2D shape dataset, InShape; the idea is to highlight every object resembling a primitive 2D shape (circle, square, rectangle, etc.). Each detected object is described by a low-dimensional embedding, obtained from a novel similarity-based head branch; the latter implements a triplet loss, encouraging similar objects (same 2D shape, color, and scale) to map close together. Subsequently, SIMCO uses this embedding for clustering, so that different types of objects can emerge and be counted, making SIMCO the very first multi-class unsupervised counter. Experiments show that SIMCO provides state-of-the-art scores on counting benchmarks and that it can also help in many challenging image understanding tasks.
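
The triplet loss mentioned in the abstract above can be illustrated with a minimal sketch. It shows the standard margin-based formulation only, not SIMCO's actual head branch; the squared Euclidean distance, the margin of 1.0, and the function names are assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss (margin value is an assumed default).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin`; otherwise it is positive.
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

Minimizing this quantity over many triplets pulls embeddings of similar objects together and pushes dissimilar ones apart, which is what makes a later clustering step feasible.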

    Indirect Match Highlights Detection with Deep Convolutional Neural Networks

    Highlights in a sports video are usually defined as actions that stimulate excitement or attract the attention of the audience. Considerable effort is spent on designing techniques that find highlights automatically, in order to automate the otherwise manual editing process. Most state-of-the-art approaches try to solve the problem by training a classifier on information extracted from the TV-like framing of players on the game pitch, learning to detect game actions that human observers have labeled according to their perception of a highlight. Such labeling is long and expensive work. In this paper, we reverse the paradigm: instead of looking at the gameplay and inferring what could be exciting for the audience, we directly analyze the audience behavior, which we assume is triggered by events happening during the game. We apply a deep 3D Convolutional Neural Network (3D-CNN) to extract visual features from cropped video recordings of the supporters attending the event. Outputs of the crops belonging to the same frame are then accumulated to produce a value indicating the Highlight Likelihood (HL), which is then used to discriminate between positive samples (i.e. when a highlight occurs) and negative samples (i.e. standard play or time-outs). Experimental results on a public dataset of ice-hockey matches demonstrate the effectiveness of our method and promote further research in this new direction. Comment: "Social Signal Processing and Beyond" workshop, in conjunction with ICIAP 201
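
The accumulation step described in the abstract above can be sketched as follows. The abstract does not say how crop outputs are accumulated or how the HL value is thresholded, so the per-frame mean, the threshold of 0.5, the `(frame_index, score)` input format, and all function names are assumptions for illustration.

```python
from collections import defaultdict


def highlight_likelihood(crop_scores):
    """Accumulate per-crop scores into one value per frame.

    `crop_scores` is an iterable of (frame_index, score) pairs, one per
    audience crop. The mean is an assumed accumulation rule.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for frame, score in crop_scores:
        totals[frame] += score
        counts[frame] += 1
    return {frame: totals[frame] / counts[frame] for frame in totals}


def detect_highlights(crop_scores, threshold=0.5):
    """Return the sorted frame indices whose HL exceeds the assumed threshold."""
    hl = highlight_likelihood(crop_scores)
    return sorted(frame for frame, value in hl.items() if value > threshold)
```

For example, `detect_highlights([(0, 0.2), (0, 0.4), (1, 0.9), (1, 0.7)])` flags only frame 1, since its crops agree on a high score while those of frame 0 do not.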

    Walking Along Curved Trajectories. Changes With Age and Parkinson's Disease. Hints to Rehabilitation

    In this review, we briefly recall the fundamental processes allowing us to change locomotion trajectory and keep walking along a curved path, and provide a review of contemporary literature on turning in older adults and people with Parkinson's Disease (PD). The first part briefly summarizes the way the body exploits physical laws to produce a curved walking trajectory. Then, the changes in muscle and brain activation underpinning this task, and the promoting role of proprioception, are briefly considered. Another section is devoted to the gait changes occurring in curved walking and steering with aging. Further, freezing during turning and rehabilitation of curved walking in patients with PD are discussed in the last part. As research on body steering while walking or turning has boomed in the last 10 years, the relevant critical issues have been tackled and ways to improve this locomotor task have been proposed. Rationale and evidence for successful training procedures are available, to potentially reduce the risk of falling in both older adults and patients with PD. A better understanding of the pathophysiology of steering, of the subtle but vital interaction between posture, balance, and progression along non-linear trajectories, and of the residual motor learning capacities in these cohorts may provide solid bases for new rehabilitative approaches.

    Curved Walking Rehabilitation with a Rotating Treadmill in Patients with Parkinson’s Disease: A Proof of Concept

    Training subjects to step in place with eyes open on a rotating platform while maintaining a fixed body orientation in space [podokinetic stimulation (PKS)] produces a post-effect consisting of inadvertent turning while stepping in place with eyes closed [podokinetic after-rotation (PKAR)]. Since the rationale for rehabilitation of curved walking in Parkinson's disease is not fully known, we tested the hypothesis that repeated PKS favors the production of curved walking in these patients, who are uneasy with turning even when straight walking is little affected. Fifteen patients participated in 10 training sessions distributed over 3 weeks. Both counterclockwise and clockwise PKS were randomly administered in each session. PKS velocity and duration were gradually increased over sessions. The velocity and duration of the ensuing PKAR were assessed. All patients showed PKAR, which increased progressively in peak velocity and duration. In addition, before and at the end of the treatment, all patients walked overground along linear and circular trajectories. Post-training, the velocity of walking bouts increased, more so for the circular than for the linear trajectory. Cadence was not affected. This study has shown that parkinsonian patients learn to produce turning while stepping when given appropriate training, and that this capacity translates into improved overground curved walking.

    Deep Learning methods for Fashion Multimedia Search and Retrieval

    Online fashion shopping is a growing market, and with this growth comes a greater need for techniques that automate an ever-expanding variety of tasks more accurately. Deep Learning techniques have been successfully applied to many tasks in the fashion domain, such as classification (recognizing different categories of clothes), recommendation (learning the preferences of a user to make suggestions), and generation (automatically generating or editing clothes). In this thesis we focus on search and retrieval problems in this domain. Tools of this kind can speed up many tasks on both the user side and the industry side. We start by analyzing existing models for fashion feature extraction and show their shortcomings. The analysis is made using visual summaries, a compact representation of a set of saliency maps that describes the elements that contributed to a classification. We show that texture information is almost ignored in these models even when it should be significant for a particular style. This leads to the second part, where a new kind of texture descriptor is designed, building upon texels, the repeated mid-level elements of textures. With simple statistics on texels, interpretable attributes can be extracted and used to improve feature representations for tasks such as image retrieval and interactive search. An attribute-based descriptor for textures can be plugged into a pre-existing image search framework and easily used by customers who wish to browse a textile catalog, or by designers who wish to choose fabrics for the production of clothes. Navigation in this catalog leverages attributes using relative comparisons for a fast exploration of the texture space. We show the advantages of working with texels and how they can be detected using a Mask-RCNN architecture trained on the ElBa dataset, which we introduce in this thesis.
    It is composed of synthetic images of element-based textures, exploring a wide variety of colors, spatial patterns and shapes. In the third part a framework for Street-To-Shop matching is presented. It is an image retrieval problem where the query image is a picture that contains a clothing item and the gallery set is composed of the pictures of the clothes sold in an e-shop. The goal is to find the product in the shop most similar to the one in the picture. Compared to existing approaches, we focus on the less explored Video-To-Shop problem by extending to the time dimension, extracting information from a video sequence to further improve search results thanks to an attention mechanism that focuses on the most salient frames. We also design a training procedure that does not require bounding box annotations but still yields higher performance than existing approaches that do require them. The model is trained on the MovingFashion dataset, which we also present in this thesis. This provides the user with a new way to browse an online shop, for example by taking pictures of clothes that somebody is wearing or that are seen in a physical shop, and searching for them online automatically. It also has many implications for social media marketing and market research for fashion companies.
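
The idea of an attention mechanism that focuses on the most salient frames can be illustrated with a minimal sketch. It is a generic softmax-weighted pooling of per-frame features, not the thesis's actual model; the per-frame saliency scores are assumed to come from some learned scorer, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of frame saliency scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_frames(frame_features, frame_scores):
    """Pool per-frame feature vectors into one video-level descriptor.

    Each frame contributes in proportion to its attention weight, so
    frames with higher (assumed) saliency scores dominate the result.
    """
    weights = softmax(frame_scores)
    dim = len(frame_features[0])
    return [
        sum(w * feat[i] for w, feat in zip(weights, frame_features))
        for i in range(dim)
    ]
```

With equal scores the pooling reduces to a plain average of the frames; as one score grows, the descriptor approaches the features of that single frame, which can then be compared against the shop gallery.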

    Understanding Deep Architectures by Interpretable Visual Summaries

    In deep learning, visualization techniques extract the salient patterns exploited by deep networks for image classification, focusing on single images; no effort has been spent on investigating whether these patterns are systematically related to precise semantic entities over multiple images belonging to the same class, thus failing to capture the understanding of the image class that the network has reached. This paper goes in this direction, presenting a visualization framework which produces a group of clusters or summaries, each one formed by crisp salient image regions focusing on a particular part that the network has exploited with high regularity to decide for a given class. The approach is based on a sparse optimization step providing sharp image saliency masks that are clustered together by means of a semantic flow similarity measure. The summaries communicate clearly what a network has exploited of a particular image class, and this is shown through automatic image tagging and a user study. Beyond deep network understanding, summaries are also useful for quantitative purposes: their number is correlated with the ability of a network to classify (more summaries, better performance), and they can be used to improve the classification accuracy of a network through summary-driven specializations.